Word Segmentation in Indo-China Languages for Digital Libraries
Abstract
This chapter introduces word segmentation methods for Indo-China languages. It describes six word segmentation methods developed for Thai, Vietnamese, and Myanmar and compares these approaches in terms of their algorithms and the results they achieved. The discussion and comparison of these methods offer insight into how word segmentation can be achieved in Indo-China languages and employed to support search functionality in digital libraries.
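To make the idea concrete, here is a minimal sketch of dictionary-based longest matching, a common baseline for segmenting text written without spaces between words, such as Thai. It is an illustration only and is not presented as one of the six methods the chapter compares. The function name `longest_matching_segment` and the three-word toy dictionary ("I", "eat", "rice") are hypothetical. A real system would use a full lexicon, and the resulting tokens are what a digital library could index for search.

```python
from typing import Iterable, List


def longest_matching_segment(text: str, dictionary: Iterable[str]) -> List[str]:
    """Greedy left-to-right longest matching (illustrative sketch).

    At each position, take the longest dictionary word that starts there.
    If no dictionary word matches, emit a single character as an unknown token.
    """
    words = set(dictionary)
    max_len = max((len(w) for w in words), default=1)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try candidate lengths from longest to shortest.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in words:
                tokens.append(candidate)
                i += length
                break
        else:
            # No dictionary word starts here: fall back to one character.
            tokens.append(text[i])
            i += 1
    return tokens


# Hypothetical toy dictionary: "I", "eat", "rice".
toy_dict = ["ฉัน", "กิน", "ข้าว"]
print(longest_matching_segment("ฉันกินข้าว", toy_dict))
```

Greedy matching commits to the longest word at each step, so it can mis-split ambiguous strings. Approaches that compare whole candidate segmentations, or that use statistical models, are typically used to reduce such errors.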